Handling of Numeric Ranges with the Subdue System
Abstract
Graph-based knowledge discovery has become a powerful tool in machine learning and data mining. It provides a flexible and natural data representation for describing real-world domains. In this work we present a novel algorithm, implemented in the Subdue system, that lets graph-based approaches handle numerical attributes during the data processing phase. Our experimental results show that using numerical attributes increased classification accuracy in the Mutagenesis and PTC domains by 22% compared with the Subdue system without our numerical attribute handling approach. Our method also outperforms other authors' results for the same domains, by around 7% for the Mutagenesis domain and around 17% for the PTC domain.

Recently, artificial intelligence techniques have become tools for solving real-world problems, and the need to represent real-world domains has increased. Many interesting real-world domains have an inherently structured character, and we need data representations that let AI algorithms be applied to them effectively. Data domains may be classified as flat, sequential, or structural according to their properties. Some of these domains contain important numeric attributes. Graph-based knowledge discovery systems can represent domains with continuous values appropriately, but they do not manipulate them appropriately. To the best of our knowledge, at the time of publishing this work no graph-based knowledge discovery algorithm deals with continuous-valued attributes. The solution proposed in the literature is to apply discretization techniques as a pre-processing or post-processing step, but not during the knowledge discovery phase. Adding this capability to graph-based algorithms will improve their handling of numeric attributes, and in this way improve both classification accuracy and the descriptive power of the discovered patterns.
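To make the pre-processing discretization mentioned in the abstract concrete, the following is a minimal sketch of equal-width binning applied to numeric vertex labels before a graph is handed to an exact-match graph miner. It is a generic illustration, not the algorithm proposed in this work and not part of the Subdue system; the function name `equal_width_bins`, the vertex identifiers, and the sample values are all hypothetical.

```python
from typing import List


def equal_width_bins(values: List[float], n_bins: int) -> List[str]:
    """Map each numeric value to a symbolic range label (equal-width binning)."""
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when all values are identical.
    width = (hi - lo) / n_bins or 1.0
    labels = []
    for v in values:
        # Clamp the index so the maximum value lands in the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        start = lo + idx * width
        labels.append(f"[{start:.2f}, {start + width:.2f})")
    return labels


# Hypothetical numeric vertex labels, e.g. partial charges of atoms.
charges = {"v1": -0.12, "v2": -0.10, "v3": 0.31, "v4": 0.35}

# Replace each number with its range label; vertices whose values fall in
# the same bin now carry identical labels and match exactly.
symbolic = dict(zip(charges, equal_width_bins(list(charges.values()), 2)))
print(symbolic)
```

With labels replaced this way, values such as -0.12 and -0.10 become the same symbol, so an exact-matching criterion can treat them as one pattern. The bin boundaries are fixed before discovery starts, which is the limitation the abstract points to: the ranges are not chosen during the knowledge discovery phase.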
Publication date: 2011